sdselect command usage

The sdselect command is a report-generating command. See Command types in the Splunk Cloud Platform Search Reference.

Generating commands use a leading pipe character and must be the first command in a search.

When you use sdselect to search an AWS Glue Data Catalog table, you must use a FROM clause to reference a federated index that maps to that AWS Glue table. This mapping lets sdselect run an efficient statistical analysis of fields in that AWS Glue table. The AWS Glue table, in turn, represents a dataset in your Amazon S3 storage.
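As a minimal sketch of that structure, the following search begins with the leading pipe, selects two fields, and uses the FROM clause to point at a federated index. The index name federated:myindex is taken from the examples later in this topic. The field names eventName and eventSource are hypothetical and assume a CloudTrail-style dataset.

```spl
| sdselect eventName, eventSource FROM federated:myindex LIMIT 100
```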

The sdselect command supports path navigation syntax for nested fields representing specific datasets within a JSON array or Amazon S3 directory structure. See Use nested fields to identify datasets in a hierarchical structure.

The "sd" in sdselect stands for "structured data".

sdselect searches consume your data scan entitlement

When you run sdselect searches, Splunk software counts the volume of data on disk that your search scans against your total data scan entitlement. To reduce data scan entitlement consumption, construct your searches so that they do not scan more data than necessary.

Splunk software tracks the volume on disk of the data you are scanning, not the number of events that you are searching. A search of compressed data, such as data from GZIP or Parquet files, might consume less of your data scan entitlement than a similar search of uncompressed data.

For more information about obtaining and monitoring your Federated Search for Amazon S3 data scan entitlement, see About Federated Search for Amazon S3.

For more information about obtaining and monitoring your Federated Analytics data scan entitlement, see About Federated Analytics.

If you cancel a search, Federated Search for Amazon S3 and Federated Analytics count the data that the search scanned up to the point of cancellation against your data scan entitlement. Failed searches do not incur charges against your data scan entitlement.

Data size limits for sdselect searches

The sdselect command has the following limits on the amount of AWS Glue table dataset data that it can process in a single search.

The total size of a single row or column in an AWS Glue table dataset cannot exceed 32 MB. For example, you exceed this limit if a table row contains a single column value that is 100 MB in size.
sdselect searches have a configurable data-scanned-per-search control limit of 10 TB. If you run an sdselect search that exceeds this limit, the search fails without incurring any consumption of your data scan entitlement. If you need to change the data-scanned-per-search limit for your sdselect searches, contact Splunk Support.

Use nested fields to identify datasets in a hierarchical structure

The sdselect command supports path navigation syntax for nested fields. Such fields begin with one or more struct type elements and end with a named field. Nested fields use dot characters ( . ) to separate the structure levels. You can use nested fields to indicate specific datasets within a JSON array or Amazon S3 directory structure. You can use these nested "dataset" fields in place of ordinary fields throughout sdselect searches.

For example, the nested field userIdentity.type contains possible type field values for the Amazon IAM userIdentity element. You might use it in the WHERE clause of an sdselect search like this:

| sdselect count(userIdentity.type) FROM federated:myindex WHERE userIdentity.type = "AssumedRole"

You might also have other nested fields for the userIdentity element, such as userIdentity.userName and userIdentity.accountId.
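For example, a search along these lines uses nested fields unquoted in both the selected field list and the WHERE clause. It assumes that federated:myindex maps to a dataset with a userIdentity element like the one described above. Because userName and accountId are different named fields, the two nested fields do not collide.

```spl
| sdselect userIdentity.userName, userIdentity.accountId FROM federated:myindex WHERE userIdentity.type = "AssumedRole" LIMIT 100
```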

Nested fields cannot include special characters. For example, if you have a field named a.b.c where b.c is a flattened field, your sdselect search will fail with an error message.

If you use nested fields in an sdselect search, do not enclose them in quotation marks. See Special handling for sdselect syntax elements.

Rename nested fields that have the same named field

When nested fields have the same named field but different preceding struct elements, you might run into issues when you use them together in an sdselect search. The sdselect command treats nested fields that share a named field as multiple fields with the same name. When sdselect finds multiple fields with the same name in a search string, it discards all of them except the last one listed.

For example, say you run the following search:

| sdselect netperf.asnum, network.asnum FROM federated:parquet LIMIT 100

The sdselect command handles the nested fields netperf.asnum and network.asnum as if they both are named asnum. The sdselect command returns results only for network.asnum, in a field titled asnum.

You can avoid this problem by renaming such nested fields at search time. For example, the following search renames the nested fields so that both asnum values appear in the results, as netperf_asnum and network_asnum:

| sdselect netperf.asnum AS netperf_asnum, network.asnum AS network_asnum FROM federated:parquet LIMIT 100

Apply evaluation functions to your sdselect searches

You can apply evaluation functions to the following parts of your sdselect searches.

Fields you select with sdselect.
Aggregate statistical functions (applied to evaluation functions).
WHERE, GROUPBY, and ORDERBY clauses.
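The following sketch shows evaluation functions in the selected field list, inside an aggregate statistical function, and in the GROUPBY and ORDERBY clauses of one search. The field names eventSource and eventName are hypothetical. Aliasing a grouped expression with AS, and the clause order FROM, GROUPBY, ORDERBY, LIMIT, are assumptions based on the other examples in this topic, so check sdselect command syntax details for the combinations that your deployment accepts.

```spl
| sdselect lower(eventSource) AS source_lc, count(lower(eventName)) AS events FROM federated:myindex GROUPBY lower(eventSource) ORDERBY lower(eventSource) LIMIT 100
```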

See sdselect command syntax details for more information about how evaluation functions work with these sdselect syntax elements.

In sdselect searches, an evaluation function is case-sensitive with regard to field values. The search head checks the syntax of the evaluation function before it runs the search and returns an error message if the expression is invalid.
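Because comparisons are case-sensitive, a condition such as userIdentity.type = "assumedrole" does not match a stored value of AssumedRole. A minimal sketch that normalizes the value before comparing it, assuming the same federated:myindex dataset used elsewhere in this topic:

```spl
| sdselect count(userIdentity.type) FROM federated:myindex WHERE lower(userIdentity.type) = "assumedrole"
```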

The following table lists the evaluation functions that are generally supported by sdselect, organized by function type. Use the links in the table to see descriptions and examples of each type of function.

Type of function	Supported functions and syntax
Conversion functions	`tonumber(<str>)` `tostring(<value>)`
JSON functions	`json(<value>)` `json_extract(<json>,<path>)` `json_extract_exact(<json>,<string>)`
Text functions	`extract(<str>,<regular_expression>)` (unique to `sdselect`) `lower(<str>)` `replace(<str>,<regular_expression>,<replacement>)` `upper(<str>)`

See Evaluation functions in the Splunk Cloud Platform Search Reference.

The WHERE clause supports additional time, date, and Boolean evaluation functions. See WHERE clause arguments.

Nested fields, certain kinds of field names, and literal strings require special handling in sdselect evaluation expressions. See Special handling for sdselect syntax elements.

Support for evaluation functions that involve regular expressions

Evaluation functions that have regular expressions as arguments, such as extract() and replace(), require Java regular expression syntax when you use them in conjunction with sdselect.

The replace() evaluation function supports Perl-compatible regular expression (PCRE) syntax when you use it with other SPL commands, such as eval, fieldformat, and where.

The extract() evaluation function is unique to sdselect. It cannot be used with other SPL commands. See Evaluation functions specific to sdselect.
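As an illustration, the following search strips an ARN prefix with replace(). The pattern uses only anchors and character classes, which behave the same in Java and PCRE syntax. Where the two syntaxes differ, write the Java form. For example, write a named group as (?<name>...), because Java regular expressions do not accept the PCRE-style (?P<name>...) form. The userIdentity.arn field and the alias principal are hypothetical.

```spl
| sdselect replace(userIdentity.arn, "^arn:aws:[a-z]+::[0-9]+:", "") AS principal FROM federated:myindex LIMIT 100
```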

Support for tonumber() and tostring()

The sdselect command supports only the required arguments for the tonumber() and tostring() functions. For the tonumber() function, sdselect does not allow the <base> argument. The sdselect command processes all tonumber() string-to-number conversions in base 10.

For the tostring() function, the sdselect command does not support the <format> argument. In sdselect searches, you can use tostring() only to facilitate straightforward number-to-string conversions.
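A sketch of the single-argument forms that sdselect accepts, using hypothetical fields statusCode (a numeric string) and bytesSent (a number). Forms that pass a second argument, such as tonumber(statusCode, 16) or tostring(bytesSent, "commas"), are not supported in sdselect searches.

```spl
| sdselect tonumber(statusCode) AS status, tostring(bytesSent) AS bytes_text FROM federated:myindex LIMIT 100
```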

Support for JSON evaluation functions

For sdselect searches, the json_extract() and json_extract_exact() evaluation functions behave as described in the JSON functions topic in the Search Reference, with the following restriction: in sdselect searches, json_extract() and json_extract_exact() require a <json> argument and exactly one <path> or <string> argument. In other words, the sdselect command does not support json_extract() without a location path or with multiple location paths, and it does not support json_extract_exact() without a string or with multiple strings.
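A sketch of the accepted one-argument-plus-one-location form, reusing the responseparts field from the special handling table in this topic. json_extract() reads "RoleUser.RoleId" as a path, meaning the RoleId key inside RoleUser. json_extract_exact() reads the same string as a single literal key name that happens to contain a dot. The aliases are hypothetical.

```spl
| sdselect json_extract(responseparts, "RoleUser.RoleId") AS role_id, json_extract_exact(responseparts, "RoleUser.RoleId") AS literal_key FROM federated:myindex LIMIT 100
```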

In sdselect searches, the json() evaluation function has the additional ability to convert SQL arrays (including arrays made up of key-value pair maps) into JSON. See Use the json() evaluation function to convert SQL arrays into JSON.

Special handling for sdselect syntax elements

The sdselect command expects you to handle certain syntax elements in specific ways. For example, you must never enclose nested fields in single or double quotation marks. On the other hand, you must enclose the names of flattened fields that contain special characters or that begin with numbers in single quotation marks.

The following table describes elements of sdselect syntax that require special handling.

Syntax element	sdselect command identification rule	Example
Nested field names	Do not enclose names of nested fields with quotation marks. For more information about nested fields, see Use nested fields to identify datasets in a hierarchical structure.	`GROUPBY userIdentity.userName`
Flattened field names containing special characters	Enclose names of flattened fields that contain non-alphanumeric characters or spaces with single quotation marks ( ' ).	`GROUPBY 'user.name', host` This is an example of a flattened field that looks like a nested field. The single quotes prevent the `sdselect` processor from trying to treat the flattened field as a nested field.
Flattened field names that begin with numeric characters	Enclose names of flattened fields that start with numeric characters with single quotation marks ( ' ).	`'5minutes'="late"`
Literal strings	Enclose literal strings with double quotation marks ( " ).	`userName = "Gary"`
Location path arguments for `json_extract` evaluation functions	Enclose with double quotation marks ( " ).	`json_extract(responseparts, "RoleUser.RoleId")`

In sdselect searches, you do not need to enclose field names in quotation marks when:

The field name begins with an alphabetic character.
The field name does not contain special characters or spaces.
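The following sketch combines these rules in one search, reusing the example fields from the table above. userIdentity.userName is a nested field, so it stays unquoted. 'user.name' and '5minutes' are flattened fields, so they take single quotation marks. "late" is a literal string, so it takes double quotation marks. host needs no quotation marks. The fields are illustrative and might not exist in your dataset.

```spl
| sdselect count(userIdentity.type) FROM federated:myindex WHERE '5minutes' = "late" GROUPBY userIdentity.userName, 'user.name', host
```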

Use the json() evaluation function to convert SQL arrays into JSON

The json() evaluation function normally evaluates whether a value can be parsed as JSON. When you use json() in an sdselect search, it can also convert SQL arrays into JSON, including SQL arrays made up of mapped key-value pairs.

For example, say your Amazon S3 dataset has SQL arrays in a people column that look like this:

people
[{last=Smith, first=Bob, age=40}, {last=Doe, first=Jane, age=30}, {last=Smith, first=Billy, age=8}]

You can write a federated search with sdselect that applies the json() evaluation function to the people column in your Amazon S3 dataset.

| sdselect json(people) AS peopleAsJSON FROM my_people_data

That sdselect search returns results that look something like this:

_time	people	peopleAsJSON
2025-01-26 21:10:45	[{last=Smith, first=Bob, age=40}, {last=Doe, first=Jane, age=30}, {last=Smith, first=Billy, age=8}]	[{"last":"Smith","first":"Bob","age":40},{"last":"Doe","first":"Jane","age":30},{"last":"Smith","first":"Billy","age":8}]

GROUPBY and ORDERBY event sort interoperation

If you use the GROUPBY clause and the ORDERBY clause in an sdselect search, the sdselect command sorts search results in the following sequence:

By the values of the fields or evaluation functions you specify in the ORDERBY clause, in the order in which you list the fields or evaluation functions. For fields or evaluation functions that have the ASC modifier or no modifier, the sdselect command sorts results by the field values in ascending order. For fields or evaluation functions that have the DESC modifier, the sdselect command sorts results by the field values in descending order.
In ascending order by the values of any GROUPBY fields that are not specified in the ORDERBY clause, in the order in which you list the fields.

If an sdselect search uses only the GROUPBY clause, sdselect sorts search results in ascending order by values of the fields and evaluation functions associated with the GROUPBY clause, following the list order of the fields and evaluation functions.
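For example, under the rules above, the following search sorts its results first by userIdentity.type in descending order, because that field is in the ORDERBY clause. It then sorts by userIdentity.accountId in ascending order, because that GROUPBY field is not in the ORDERBY clause. Without the ORDERBY clause, the results would be sorted by userIdentity.accountId and then by userIdentity.type, both in ascending order. The fields are hypothetical.

```spl
| sdselect userIdentity.accountId, userIdentity.type, count(eventName) FROM federated:myindex GROUPBY userIdentity.accountId, userIdentity.type ORDERBY userIdentity.type DESC LIMIT 100
```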

Control the maximum number of returned results if a LIMIT clause is not present

If the LIMIT clause is not present in an sdselect search string, the max_number_of_results setting in limits.conf determines the maximum number of results that the search can return. The max_number_of_results setting defaults to 100,000 results.

If you use Splunk Enterprise, you can remove the constraint on the number of results that sdselect searches can return by setting max_number_of_results to 0.

Unbounded sdselect searches over large datasets can consume a large amount of your data scan entitlement.

If a LIMIT clause is present in an sdselect search, it overrides the max_number_of_results setting for that search.

Splunk Cloud Platform users who want to change the max_number_of_results setting must contact Splunk Support to do so.

Ensure consistent sdselect output when the number of matched results exceeds set limits

When an sdselect search matches more results than an explicit LIMIT clause allows, or, when no LIMIT clause is present, more results than the max_number_of_results setting allows, repeated runs of that search might return different result sets.

If you encounter this issue, use one of the following methods to resolve it:

Design the search so that it always matches fewer results than the LIMIT clause value or, if no LIMIT clause is present, the max_number_of_results limit. For example, you might reduce the number of matched results by running the search over a smaller time range or by designing a more restrictive WHERE clause.
Add an ORDERBY clause to the search. If the search has a LIMIT clause, you must put the ORDERBY clause before the LIMIT clause.
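A sketch of the second method follows. The ORDERBY clause comes after the WHERE clause and before the LIMIT clause, so the search orders its results before the LIMIT clause cuts them off. The fields and the LIMIT value are hypothetical.

```spl
| sdselect userIdentity.userName, eventName FROM federated:myindex WHERE userIdentity.type = "AssumedRole" ORDERBY userIdentity.userName, eventName LIMIT 1000
```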
